Skip to content

Conversation

Diveyam-Mishra
Copy link

Summary
Records iframe interactions reliably without flooding steps.
Replays deterministic steps inside the correct iframe instead of navigating into the iframe URL.
Filters ad/analytics/recaptcha iframe navigations and about:blank.
Deduplicates steps (including scroll/input/click) to match real user intent.

Extension: Content (src/entrypoints/content.ts)
Enabled all frames and about:blank: allFrames: true, matchAboutBlank: true.
Added frame context on all events: frameUrl, frameIdPath, isTopFrame.
Input noise reduction: isTrusted checks, debounce per element, skip rapid empty repeats.
New-tab intent signal: PREPARE_NEW_TAB emitted on modifier/middle-clicks and target=_blank.

Extension: Background (src/entrypoints/background.ts)
Activated tab gating + new-tab intent correlation; ignore unactivated/background tabs.
Iframe nav filtering:
Tracks interactedFrameUrls per tab; only accepts rrweb Meta navigations from frames the user interacted with.
Drops about:blank; adds time-window gating (defaults to 3s after interaction).
Allow/deny logic (allowlist overrides blocklist), with settings persisted.
rrweb payload handling:
Scroll: merge consecutive updates; suppress chrome://newtab; dedupe identical XY within 200ms regardless of targetId.
Meta: convert to navigation step only if allowed by the above rules.
Step dedupe post-pass: collapse consecutive duplicates for navigation/input/click/scroll/key_press; update timestamp/screenshot instead of appending.
Navigation consolidation: per-tab merge of redirects to reduce churn.
Persisted recording state: default stopped; load/broadcast on startup; save on start/stop.

Extension: Types/Schema
types.ts: Added optional frameIdPath to StoredCustom* and frameUrl to rrweb.
workflow-types.ts: Promoted frameIdPath to Step types (Click/Input/KeyPress/Navigation/Scroll).

Backend (workflows/workflow_use)
Recorder safety filter (recorder/service.py): drop about:blank and obvious ad/analytics hosts before saving workflow.
Controller models (controller/views.py): RecorderBase now includes url and frameIdPath so steps carry frame/page hints.
Deterministic controller (controller/service.py):
Clicks execute inside the correct iframe:
Resolve context by frameIdPath (preferred) or frameUrl (origin+prefix).
Do not navigate to iframe URLs; only auto-navigate for top-document clicks without frame hints.
Fallback: search all frames (prioritize target origin) to find selector.
Increased click timeout to 2500ms.
Workflow runner (workflow/service.py):
Skip “pre-wait for next selector” when next step’s declared url/frameUrl differs from current page URL to avoid false failures between pages.
Noise/Correctness

Prevent duplicate steps due to pointer-only variance and consecutive identical content.
Drop spurious iframe navigations (recaptcha/ads/metrics).
State is consistent across reloads; recording does not auto-start unexpectedly.

Additional Notes
Frame resolution uses frameIdPath first; frameUrl fallback matches origin + prefix to disambiguate.

…hild tab steps missing.

Background/ad/tracker tabs polluted logs.
Excessive duplicate navigation events per redirect/loading cycle.
Massive explosion of input steps (hundreds of empty, unchanged values).
Unnecessary workflow updates when steps unchanged.

New Tab Intent Heuristic:

Content script emits PREPARE_NEW_TAB on ctrl/cmd/middle click or target=_blank.
Background correlates upcoming chrome.tabs.onCreated to mark userInitiated.
Activated tabs tracked; only activated or userInitiated tabs produce steps.
Tab Filtering:

Suppress all events (except activation) from tabs never activated and not correlated with an intent window (4s).
Reduces noise from ads/trackers.
Navigation Consolidation:

Maintain lastNavigationIndexByTab; update existing navigation step instead of appending duplicates during rapid redirects or title/url churn.
Input Event Deduplication:

Content script: per-xpath cache; skip unchanged value; debounce; skip rapid empty repeats.
Background: merge consecutive identical field edits; collapse bursts of empty values within 5s (timestamp refresh only).
Track lastInputPerKey (tabId|xpath) to decide merge vs new step.
@sauravpanda
Copy link
Collaborator

can you quickly make a video in action and post here?

@Diveyam-Mishra
Copy link
Author

Workflow.1st.PR.mp4

Here's a short video

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants